Abstract 

Social media, such as blogs, are often seen as democratic entities that allow more voices to be heard than the 
conventional mass or elite media. Some also feel that social media exhibits a balancing force against the arguably 
slanted elite media. A systematic comparison between social and mainstream media is necessary but challenging 
due to the scale and dynamic nature of modern communication. Here we propose empirical measures to quantify 
the extent and dynamics of social (blog) and mainstream (news) media bias. We focus on a particular form of bias, 
coverage quantity, as applied to stories about the 111th US Congress. We compare observed coverage of Members 
of Congress against a null model of unbiased coverage, testing for biases with respect to political party, popular front 
runners, regions of the country, and more. Our measures suggest distinct characteristics in news and blog media. A 
simple generative model, in agreement with data, reveals differences in the process of coverage selection between 
the two media. 



"In the end, we'll have more voices and more options." 

- Dan Gillmor, We the media 

1 Introduction 

Gillmor [1] envisioned social media, powered by the 
growth of the Internet and related technologies, as a 
form of grassroots journalism that blurs the line be- 
tween producers and consumers and changes how in- 
formation and opinions are distributed. He argued that 
"the communication network itself will be a medium 
for everyone's voice, not just the few who can af- 
ford to buy multimillion-dollar printing presses, launch 
satellites, or win the government's permission to squat 
on the public airways." This view has been embraced 
by activists who consider social media as a balancing 
force to the conventionally assumed slanted or biased 
elite media. Indeed, social media can be used by under- 
privileged citizens, promising a profound impact and a 
healthy democracy. 

Many believe that the mainstream media is slanted, 
but disagree about the direction of slant. The conven- 
tional belief about media bias has held for decades, 



but attempts at developing objective measurement have 
only recently begun. The study by Groseclose and Milyo 
[2] showed the presence of bias in mass media 
(cable and print news) and new media (Internet web- 
sites, etc.). Their results, despite receiving criticism, 
are fairly consistent with conventional wisdom. On the 
other hand, researchers have observed an "echo chamber" 
effect within the new media: people select particular 
news to reinforce their existing beliefs and attitudes. 
Iyengar and Hahn [3] argued that such selective 
exposure is especially likely in the new media environment 
due to information overload. With search, filtering, 
and communication technologies, people can easily 
discover and disseminate information that is supportive 
of or consistent with their existing beliefs. 

Do social media exhibit more or less bias than mass 
media and, if so, to what extent? Identifying media 
bias is challenging for a number of reasons. First, bias 
is not easy to observe. It has been recognized that "bias 
is in the eyes of the beholder", meaning that, e.g., conservatives 
tend to believe that there is a liberal bias in 
the media while liberals tend to believe there is a conservative 
bias [2, 3]. Hence, finding textual indicators 
of bias is difficult, if not impossible. Second, the as- 
sessment of bias usually implies knowing what "fair- 
ness" would be, which may not be available or consis- 
tent across different viewpoints. Third, Internet-based 
communication promises easy, inexpensive, and instant 
information distribution, which not only increases the 
number of online media outlets, but also the amount 
and frequency of information and opinions delivered 
through these outlets. The scale and dynamic nature of 
today's communication should be accounted for. 

In this paper, our major contribution is that we pro- 
pose empirical measures to quantify the extent and dy- 
namics of "bias" in mainstream and social media (here- 
after referred to as News and Blogs, respectively). Our 
measurements are not normative judgments; rather, they 
examine bias by looking at the attributes of those being 
mentioned, against a null model of "unbiased" coverage. 
We focus on the number of times a member of the 
111th US Congress was referenced, and study the distribution 
and dynamics of the references within a large 
set of media outlets. We consider "the unbiased" as a 
configurable baseline distribution and measure how the 
observed coverage deviates from this baseline, with the 
measurement uncertainty of observations taken into ac- 
count. We demonstrate bias measures for slants in fa- 
vor of specific political parties, popular front-runners, 
or certain geographical regions. Using these measures 
to examine newly collected data, we have observed dis- 
tinct characteristics of how News and Blogs cover the 
US Congress. Our analysis of party and ideological bias 
indicates that Blogs are not significantly less slanted 
than News. However, their slant orientations are more 
sensitive to exogenous factors such as national elec- 
tions. In addition, blogs' interests are less concentrated 
on particular front-runners or regions than news out- 
lets. 

While our measures are independent of content, we 
further investigate two aspects of the content related to 
our measures: the hyperlinks embedded in articles and 
sentiments detected from the articles. The hyperlink 
patterns suggest that outlets with a Democrat-slant (D- 
slant for short) are more likely to cite each other than 
outlets with a Republican-slant (R-slant). The senti- 
ment analysis suggests there is a weak correlation be- 
tween negative sentiments and our measures. 

To better understand the distinctive slant structures 
between the two media, we propose to use a sim- 



ple "wealth allotment" model to explain how legisla- 
tors gain attention (references) from different media. 
The results on blog media's inclination toward a rich-get-richer 
mechanism indicate that they are more likely to 
echo what others have mentioned. This observation 
does not contradict our measures of bias - compared 
with news media, blogs are weaker adherents to par- 
ticular parties, front-runners or regions but are more 
susceptible to the network and exogenous factors. 

The rest of this paper is organized as follows. We 
first discuss related work, followed by the details of 
our collected data. We then detail the different types of 
coverage bias and how to quantify them and then ex- 
amine the results, both structurally (via hyperlinking) 
and textually (via text-based sentiment analysis). Fi- 
nally, we present a simple generative model of media 
coverage and conclude with a discussion of open issues 
and future work. 

2 Related Work 

Concerns about mainstream media bias have been a 
controversial and critical subject in journalism due to 
the media's power to shape a democratic society. Studies 
on media bias can involve surveys and interviews 
[5] and content analysis [6], as well as theoretical 
models such as structural economic causes. Apart from 
these qualitative arguments, Groseclose and Milyo [2] 
proposed a media bias measure that counts how often 
a particular media outlet cites various think tanks and 
policy groups. 

Prior studies have drawn controversial responses, in 
part because it is difficult to separate the recognition 
of bias from the belief in bias. 
A dependence on viewers' beliefs has been observed in 
studies [2,4], which is relevant to the theories on how 
supply-side forces or profit-related factors cause slants 
in media EKH. Because of such a dependency, compu- 
tationally identifying bias from media content remains 
an emerging research topic, and requires insights from 
other language analysis studies such as sentiment analysis 
or partisan features in texts [10, 8]. 

While mass media have the ability to affect the public's 
interests, social media represent large samples of 
expression from both influencers and those being influenced. 
Hence the "crowd voice" collected in social media 
has attracted considerable research. The viral behavior 
and predictive power of social media in response 
to politics, the economy and other areas have been examined 
in recent studies [11, 12]. For example, Leskovec 
et al. [11] tracked the traversal of "memes" based on 
short distinctive phrases echoed by online news and 
blogs over time. Another work by O'Connor et al. [12] 
studied the relationship between tweet sentiments and 
polls in order to examine how the sentiments expressed 
on the Twitter microblogging platform can be 
used as political or economic indicators. 

In this paper, we do not attempt to tackle the compu- 
tationally difficult task of identifying bias in media text. 
Instead, we study the characteristics of the two media 
based on purely quantitative measures independent of 
media content. We are interested in studying the role 
of today's social media, and we hope our analysis will 
contribute to the growing understanding of this subject. 

3 Data Model 

3.0.1 Data Collection 

Our data is based on RSS feeds aggregated by OpenCongress. 
OpenCongress is a non-profit, non-partisan 
public resource website that brings together 
official government data with timely information about 
what is happening in Congress. We continuously monitor 
and collect the OpenCongress RSS feeds for each 
individual member of Congress. This paper examines 
News and Blogs coverage of the 111th US 
Congress, both Senators and Representatives. The 
dataset spans from September 1 to January 4, covering 
the 2010 mid-term election on November 2. 

Figure 1 shows the volume (total number of news 
articles or blog posts) over time in this dataset. The 
central peak corresponds to the mid-term election. In 
total, 57,221 news articles and 66,830 blog 
posts were collected in the four-month period. 

3.0.2 Networked Data Model 

We study the structure of the two media by constructing 
a modal network containing different types of nodes 
and edges. The network structure is illustrated in 
Fig. 2. More specifically, we have: 



www.opencongress.org 

OpenCongress uses Daylife (www.daylife.com) and 
Technorati (technorati.com) to aggregate articles from these 
feeds. The possible selection biases in these filtering processes are 
not considered in this paper. 

An example news/blog coverage feed can be found at 
http://www.opencongress.org/people/news, 
blogs/300075_Lisa_Murkowski 




Figure 1: The volume (total number of news articles or blog 
posts) over time. The highest peak corresponds to the mid- 
term election. 



Nodes There are three sets of nodes: a news set, denoted 
by $V_N$, that contains 5,149 news outlets, a 
blog set $V_B$ of 19,693 blogs, and a legislator set 
$V_L$ that covers 530 lawmakers. 

Edges Each edge $e_{ik}$ records when media outlet $i$ publishes 
an article referencing legislator $k$. We extract 
64,222 such edges in 46,501 news articles, 
denoted as edge set $E_{NL}$, and 91,837 edges in 
62,301 blog posts, denoted as $E_{BL}$. Edges are associated 
with timestamps and texts. 

Node attributes For legislators, we record attributes 
such as party, district, etc., based on the legisla- 
tors' profiles and external data sources. 

While we focus on "reference" or citation edges, this 
networked model can also include other types of edges, 
e.g. hyperlinks between outlets, voting preferences 
among legislators, etc. 
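
As a rough illustration of this data model, the sketch below stores each reference edge as a small record and tallies, per outlet, how many references fall into each legislator group. The `Reference` record, the `count_by_group` helper, and the toy outlets and legislators are hypothetical names introduced for this example; they are not taken from the paper's dataset or code.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Reference:
    """One edge e_ik: outlet i published an item referencing legislator k."""
    outlet: str
    legislator: str
    published: date


def count_by_group(edges, group_of):
    """Return {outlet: Counter({group: count})} for one legislator attribute."""
    counts = {}
    for edge in edges:
        group = group_of[edge.legislator]
        counts.setdefault(edge.outlet, Counter())[group] += 1
    return counts


# Hypothetical toy data, for illustration only.
party = {"Legislator A": "D", "Legislator B": "R"}
edges = [
    Reference("example-news", "Legislator A", date(2010, 10, 1)),
    Reference("example-news", "Legislator B", date(2010, 10, 2)),
    Reference("example-blog", "Legislator B", date(2010, 11, 3)),
]
n = count_by_group(edges, party)
```

Swapping `party` for another attribute map (state, gender, ideological side) would give the group counts needed for the other bias types under the same structure.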

4 Types of Bias 

In journalism, the term "media bias" refers to the se- 
lection of which events and stories are reported and 
how they are covered within the mass media. The 
most commonly discussed biases include reporting that 
supports (or attacks) particular political parties, candi- 
dates, ideologies, corporations, races, etc. In this paper, 
we begin with perhaps the simplest form of measurable 
bias - the distribution of coverage quantity, i.e. how 
many times an entity of interest is referenced by a media 
outlet. We argue that, regardless of a positive or 
negative stance towards an entity, an imbalanced quantity 
of coverage, if present, is itself a form of bias. 

We also have a small number of blogs hosted by mass media 
news outlets, e.g. CNN (blog). This paper does not include analysis 
of such blogs. 

Figure 2: The networked data model. There are three types 
of nodes: news outlets, blog outlets and legislators. An edge 
pointing toward a legislator represents each time an outlet 
references that legislator in an article or post. 

An outlet's references can be biased in a number of 
ways: 

Party References are focused on a particular political 
party. 

Front-runner References are concentrated on a few 
legislators who we term "front-runners", while the 
majority of legislators receive little or no atten- 
tion. 

Region References focus on certain geographical lo- 
cations. 

Ideology An ideology is a collection of ideas spanning 
the political spectrum. Ideological bias indicates 
that frequently referenced legislators favor certain 
ideological tendencies. 

Gender The preference towards covering legislators 
of one gender. 

We discuss how to measure different types of bias in 
a unified model. Other types of bias, such as those in 
favor of a particular race or ethnic group, can also be 
measured through our method. 

Based on the measurements associated with individ- 
ual media outlets, we derive system-wide bias mea- 
sures that allow us to characterize and compare the bias 
structure between the news and blog media. 

Our view on the meaningfulness of a measurement based 
solely on quantity is similar to that of the study of Groseclose and Milyo [2]. 



5 Quantifying Bias 

In this section, we describe our method for quantifying 
and comparing bias in News and Blogs. 

5.0.3 Notation 

Let $n^c_{ik}$ be the number of times media outlet $i$ references 
legislators in group $k$, where $c \in \{\text{News}, \text{Blogs}\}$ is the 
media category ($c$ is omitted when there is no need to 
distinguish the categories). In the case of measuring 
party bias, $k \in \{D, R\}$ indicates the Democratic or 
Republican political parties. Let $n_i = \sum_k n_{ik}$ be the 
total number of references made by outlet $i$. We begin 
with a specific case, measuring the two-party bias, 
and then describe a more general model for measuring 
other types of bias. 

5.1 Party Slant 

A naive approach for measuring an outlet's biased cov- 
erage of two political parties is to compare the number 
of times members in each party are referenced. The ra- 
tio of the reference counts of one party against the other 
may be used to compare outlets that reference different 
parties with different frequencies. There are two issues 
with this approach: (i) this ratio may lack statistical 
significance for some outlets, and (ii) it assumes that 
fair coverage of the two parties requires roughly equal 
quantities of references to each. 

To resolve these issues, we use the log-odds-ratio as 
follows. We define $\theta_{ik}$, the "slant score" of outlet $i$ to 
party $k$, as 

$$\theta_{ik} = \log(\text{odds-ratio}) = \log \frac{n_{ik} / (n_i - n_{ik})}{p_k / (1 - p_k)}, \tag{1}$$

where $p_k$ is the baseline probability that $i$ refers to $k$, 
and here we assume this variable is fixed for all $i$. The 
advantage of having such a baseline probability is that 
"fairness" becomes configurable. For example, one can 
consider fairness as a 50-50 chance to reference either 
party (i.e. $p_D = p_R = 0.5$). One can also define 
$p_D = 0.6$ since roughly 60% of the studied legislators 
are Democrats. No matter what baseline probability is 
given, we have a simple interpretation: $\theta_{ik} = 0$ means 
no bias w.r.t. that baseline. In this two-party case, we 
take $\theta_i = \theta_{ik}$ with $k = D$, so that $\theta_i > 0$ means outlet $i$ is 
more likely to be D-slanted. A slant score with value $a$ 
can be interpreted as follows: the odds of outlet $i$ 
referencing Democratic legislators are $e^a$ times what they 
would be if those references followed the baseline. 
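
The slant score can be sketched in a few lines. The example assumes that Eq. 1 is the log of the observed odds of referencing group $k$ divided by the baseline odds $p_k/(1-p_k)$; the function name `slant_score` and the sample counts are hypothetical.

```python
import math


def slant_score(n_ik, n_i, p_k):
    """Log-odds-ratio of observed references to group k versus baseline p_k."""
    if not 0 < n_ik < n_i:
        raise ValueError("score is undefined when n_ik is 0 or equals n_i")
    if not 0.0 < p_k < 1.0:
        raise ValueError("baseline p_k must lie strictly between 0 and 1")
    observed_odds = n_ik / (n_i - n_ik)
    baseline_odds = p_k / (1.0 - p_k)
    return math.log(observed_odds / baseline_odds)


# Hypothetical outlet: 60 of its 100 references go to Democratic legislators.
theta_even = slant_score(60, 100, 0.5)  # against a 50-50 baseline
theta_comp = slant_score(60, 100, 0.6)  # against a 60% Democratic baseline
```

Under these assumptions the same outlet looks D-slanted against the 50-50 baseline and essentially unslanted against the 60% baseline, which is the sense in which fairness is configurable.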
Figure 3: Scatter plot of the number of references (observations) 
against party (left) and front-runner (right) slant 
scores $\theta$ for News and Blogs. Outlets with fewer than 20 articles 
are not shown. 



Table 2: The collective slant scores. Parenthetical values 
indicate the standard deviation of the measured slant score. 

                             House                              Senate
                      $\Theta_{con}$   $\Theta_{pop}$    $\Theta_{con}$   $\Theta_{pop}$
Party         News    -0.02 (0.02)     -0.06 (0.02)      -0.22 (0.03)     -0.45 (0.04)
              Blogs   -0.11 (0.02)     -0.15 (0.02)      -0.18 (0.04)     -0.41 (0.04)
Ideology      News    -0.05 (0.02)     -0.08 (0.02)      -0.19 (0.04)     -0.45 (0.04)
              Blogs   -0.16 (0.02)     -0.19 (0.02)      -0.12 (0.04)     -0.39 (0.04)
Gender        News    -0.26 (0.04)      0.07 (0.03)      -0.28 (0.06)      0.45 (0.05)
              Blogs   -0.29 (0.04)      0.03 (0.04)      -0.32 (0.07)      0.41 (0.06)
Front-runner  News     0.68 (0.01)      0.60 (0.01)       0.66 (0.02)      0.55 (0.03)
              Blogs    0.33 (0.01)      0.23 (0.01)       0.39 (0.02)      0.29 (0.03)
Region        News     0.97 (0.01)     -0.13 (0.01)       0.76 (0.01)      0.45 (0.03)
              Blogs    0.61 (0.01)     -0.21 (0.02)       0.44 (0.02)      0.18 (0.03)




Table 1: Slant scores $\theta$ for major news outlets and most 
slanted blogs. For party slant, a positive (negative) score 
means the outlet is likely to be D-slanted (R-slanted). For 
front-runner and regional slant, a larger score indicates the 
outlet is more focused on few particular legislators or states. 

        Party ($\theta$)               Front-runner ($\theta$)           Region ($\theta$)
News    nbc (0.51)                     washington post (1.03)            los angeles times (1.30)
        new york times (0.07)          cnn (1.02)                        nbc (1.19)
        washington post (-0.01)        fox (0.91)                        cbs (1.12)
        abc (-0.03)                    wall street journal (0.86)        cnn (1.04)
        cbs (-0.03)                    cbs (0.84)                        washington times (1.00)
        los angeles times (-0.07)      nbc (0.83)                        U.S. news (0.98)
        newshour (-0.10)               los angeles times (0.82)          wall street journal (0.96)
        cnn (-0.11)                    msnbc (0.74)                      usa today (0.96)
        fox (-0.13)                    U.S. news (0.71)                  washington post (0.95)
        npr (-0.14)                    new york times (0.70)             msnbc (0.92)
        wall street journal (-0.15)    washington times (0.70)           npr (0.92)
        U.S. news (-0.22)              usa today (0.66)                  new york times (0.89)
        bbc (-0.38)                    npr (0.64)                        abc (0.87)
        usa today (-0.39)              abc (0.61)                        fox (0.84)
        msnbc (-0.39)                  newshour (0.32)                   newshour (0.78)
        washington times (-0.96)       bbc (0.00)                        bbc (0.20)
Blogs   dissenting times (5.22)        arlnow.com (9.41)                 blue jersey (8.32)
        cool wicked stuff (3.89)       janesville (9.05)                 [...] Virginia politics (7.86)
        justicedeniedl3501 (3.58)      take back idaho's [...] (8.84)    politics on the hudson (7.34)
        polifrog.com (3.54)            moral science club (8.84)         calwatchdog (7.23)
        dennis miller (3.46)           murray for congress (8.67)        staradvertiser [...] (7.19)




The slant score's variance is given by the Mantel-Haenszel 
estimator [13]: 

$$\mathrm{Var}(\theta_i) = \frac{1}{n_{ik}} + \frac{1}{n_i - n_{ik}} + \frac{1}{n_i p_k} + \frac{1}{n_i (1 - p_k)}. \tag{2}$$

The variance gives the significance of the slant score 
measure, which relies on the number of observations 
($n_i$ and $n_{ik}$) we have for each outlet. 
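
To make the dependence on sample size concrete, the sketch below evaluates the variance and a normal-approximation interval for two hypothetical outlets with the same Democratic share but very different volumes. It assumes the four-term form of Eq. 2 with denominators $n_{ik}$, $n_i - n_{ik}$, $n_i p_k$ and $n_i(1 - p_k)$; the function names and counts are illustrative only.

```python
import math


def slant_variance(n_ik, n_i, p_k):
    """Variance of the log-odds-ratio slant score (assumed four-term form)."""
    return (1.0 / n_ik + 1.0 / (n_i - n_ik)
            + 1.0 / (n_i * p_k) + 1.0 / (n_i * (1.0 - p_k)))


def confidence_interval(theta, variance, z=1.96):
    """Symmetric normal-approximation interval around a slant score."""
    half_width = z * math.sqrt(variance)
    return theta - half_width, theta + half_width


# Two hypothetical outlets with the same 60% share of Democratic references.
small = slant_variance(12, 20, 0.6)      # 20 references in total
large = slant_variance(1200, 2000, 0.6)  # 2000 references in total
```

With a hundred times more references the variance shrinks by roughly a factor of one hundred, so a score from a low-volume outlet carries much less weight than the same score from a high-volume one.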

Figure 3 (a) shows the number of references as a 
function of party slant scores for outlets with more than 
20 articles in our dataset. The distribution of outlets' 
slant scores appears to be roughly symmetric in both 
directions, and outlets making more references tend to 
be less slanted. Table 1 lists the slant scores for some 
major news outlets and the most slanted blogs. 



5.1.1 Summary statistics 

In order to characterize the overall bias within a medium, 
we derive a system-wide bias measure based on 
the individual outlets' measures. We use a random effects 
model, which assumes not only variation within 
each outlet, but also variation across different outlets in 
the system. More specifically, the model assumes that 
the slant scores for $n$ outlets $(\theta_1, \ldots, \theta_n)$ are sampled 
from $\mathcal{N}(\theta, \tau^2)$, and there are two sources of variation: 
the variance between outlets $\tau^2$ and the variance within 
outlets $\sigma^2$. Hence, the model is given by 

$$\theta_i \sim \mathcal{N}(\theta, \sigma^2 + \tau^2). \tag{3}$$

We use the DerSimonian-Laird estimator [14] to obtain 
$\theta^*$ and $\mathrm{Var}(\theta^*)$, where $\theta^*$ is the asymptotically 
unbiased estimator for $\theta$. The media-wide collective 
party slant score, $\Theta$, is defined as $\Theta = \theta^*$ with a 
$\pm 1.96 \sqrt{\mathrm{Var}(\theta^*)}$ confidence interval. 
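
A minimal sketch of the pooling step follows, using the standard method-of-moments form of the DerSimonian-Laird estimator. It assumes each outlet contributes a slant score together with its own within-outlet variance; the function name and the toy inputs are hypothetical, and the paper's exact implementation may differ.

```python
def dersimonian_laird(thetas, variances):
    """Pool per-outlet scores with a random-effects model.

    Returns (theta_star, var_theta_star, tau2), where tau2 is the estimated
    between-outlet variance.
    """
    if len(thetas) != len(variances) or len(thetas) < 2:
        raise ValueError("need at least two outlets with matching variances")
    w = [1.0 / v for v in variances]
    sum_w = sum(w)
    fixed_mean = sum(wi * t for wi, t in zip(w, thetas)) / sum_w
    # Cochran's Q: weighted squared deviation from the fixed-effect mean.
    q = sum(wi * (t - fixed_mean) ** 2 for wi, t in zip(w, thetas))
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w
    tau2 = max(0.0, (q - (len(thetas) - 1)) / c) if c > 0 else 0.0
    # Random-effects weights add tau2 to every within-outlet variance.
    w_star = [1.0 / (v + tau2) for v in variances]
    theta_star = sum(wi * t for wi, t in zip(w_star, thetas)) / sum(w_star)
    return theta_star, 1.0 / sum(w_star), tau2


# Two hypothetical outlets with opposite scores and equal precision.
theta_star, var_star, tau2 = dersimonian_laird([0.0, 1.0], [0.1, 0.1])
```

When the outlets disagree by more than their within-outlet variances explain, `tau2` becomes positive and the pooled variance widens accordingly.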

Table 2 summarizes slants with respect to different 
baselines. The measure $\Theta_{con}$ is based on the party 
composition of members in Congress, and $\Theta_{pop}$ is 
based on the fraction of the US population represented 
by the legislators (in each party). The statistical significance 
of each measure is represented by the variance. 
Note that in this two-party case, a different baseline can 
be obtained simply by shifting the score. For example, 
if one chooses to use $p_D = p_R = 0.5$ as the baseline 
probability, the measure $\Theta_{0.5}$ can be calculated from 
$\Theta_{con}$ by adding $\log \frac{p_D}{1 - p_D} \approx 0.405$ (where, in terms of 
Congress composition, $p_D \approx 0.6$). 
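
The size of this shift can be checked directly for a single outlet, assuming the slant score is the log of observed odds over baseline odds. The superscripts marking which baseline is in use are notation introduced here for the derivation only:

```latex
\begin{aligned}
\theta_i^{(0.5)} - \theta_i^{(\mathrm{con})}
  &= \log\frac{n_{iD}/(n_i - n_{iD})}{0.5/0.5}
   - \log\frac{n_{iD}/(n_i - n_{iD})}{p_D/(1 - p_D)} \\
  &= \log\frac{p_D}{1 - p_D}
   = \log\frac{0.6}{0.4}
   = \log 1.5 \approx 0.405 .
\end{aligned}
```

The observed odds cancel, so the offset is the same constant for every outlet and does not depend on how many references the outlet made.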

We also separate our measures for referencing members 
of the House and Senate to see if outlets exhibit 
different slants when covering the two chambers. 
Evaluated on the party percentage baseline, both media 
show R-slant when referencing Senators, and blogs 
are more R-slanted when referencing members of the 
House. Hence Blogs are overall more R-slanted than 
News. This interpretation depends on what baseline 
is chosen, however. For example, if we choose to use 
the 50-50 convention, both media become D-slanted. 
However, it is important to note that the absolute difference 
between the bias measures for the two media 
does not change with baseline. 

5.2 Slant Dynamics 

To study how media bias may change over time, we 
calculate the slant scores using references made during 
running windows. We measure $\Theta(t, w)$ as a function of 
time $t$ and window length $w$. Figure 4 shows the temporal 
slant scores for the two media during the four-month 
period, based on a $w = 2$-week running window. 
The slant of both media changes slightly after the 
mid-term election: compared with their pre-election 
slants, News become slightly more R-slanted when referencing 
Senators and Blogs become more R-slanted when 
referencing Representatives. Overall, the media, especially 
Blogs, become more R-slanted after the election. 
This is plausible given the Republican victories. 
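
The running-window idea can be sketched for a single outlet as follows. The example assumes a trailing window that ends at day `end` and covers the preceding `window_days` days, and it scores the window with a log-odds-ratio against a fixed baseline; both choices, along with the function names and toy data, are assumptions of this sketch. The collective $\Theta(t, w)$ would additionally pool such per-outlet scores.

```python
import math
from datetime import date, timedelta


def log_odds_slant(n_k, n_total, p_k):
    """Log-odds-ratio slant; None when it is undefined for these counts."""
    if n_k == 0 or n_k == n_total:
        return None
    return math.log((n_k / (n_total - n_k)) / (p_k / (1.0 - p_k)))


def windowed_slant(refs, end, window_days, p_d):
    """Party slant from references dated in (end - window_days, end]."""
    start = end - timedelta(days=window_days)
    parties = [party for day, party in refs if start < day <= end]
    if not parties:
        return None
    n_d = sum(1 for party in parties if party == "D")
    return log_odds_slant(n_d, len(parties), p_d)


# Hypothetical reference log for one outlet: (publication day, party).
refs = [
    (date(2010, 8, 1), "R"),
    (date(2010, 10, 1), "D"),
    (date(2010, 10, 5), "D"),
    (date(2010, 10, 9), "D"),
    (date(2010, 10, 10), "R"),
]
score = windowed_slant(refs, date(2010, 10, 14), 14, 0.5)
empty = windowed_slant(refs, date(2010, 9, 1), 14, 0.5)
```

Sliding `end` forward one day at a time and recording the score would give a time series of the kind plotted against $t$.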

These results raise an important question: do the 
majority of outlets become more R-slanted after the 
election, or do R-slanted outlets become more active 
while D-slanted outlets become quieter? To examine 
what caused the slant change, we plot in Fig. 5 the 
change in slant score $\Delta\theta_i = \theta_i(t_2) - \theta_i(t_1)$, where 
$t_1 \in$ [Sep. 1, Oct. 30] and $t_2 \in$ [Nov. 7, Jan. 4], for 
each outlet against its slant score before the election. 
(Point size indicates the number of references observed 
after the election.) We use a linear regression to quantify 
the slant change. Surprisingly, we see media outlets 
shifted slightly toward the other side after the election 
regardless of their original slants, but overall the 
originally D-slanted outlets become more R-slanted. 
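
The regression step can be illustrated with an ordinary least-squares fit of the change in slant against the pre-election slant. The sketch assumes an unweighted fit (the source does not say whether points are weighted by reference volume), and the five outlets below are made-up values, not results from the paper.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit ys ~ intercept + slope * xs."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two paired observations")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Hypothetical per-outlet slant scores before and after the election.
before = [-1.0, -0.5, 0.0, 0.5, 1.0]
after = [-0.9, -0.5, -0.1, 0.3, 0.7]
delta = [a - b for a, b in zip(after, before)]
slope, intercept = fit_line(before, delta)
```

In this toy example a negative slope means outlets move toward the opposite side in proportion to their original slant, and a negative intercept means an initially neutral outlet drifts toward the R side.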

5.3 Front-Runner Slant 

To evaluate whether or not the media pay excessive 
attention to popular front-runners, we extend the 
dichotomous-outcome measure used in the previous 
section. We consider a generalization of the odds ratio 
proposed by Agresti [15]. 

Figure 4: Slant score as a function of time. Overall, the 
media, especially Blogs, become more R-slanted after the 
2010 election. 

Figure 5: Media outlets shift slightly towards the 
other side after the election. The majority of news outlets 
become slightly more R-slanted. For blogs, originally 
D-slanted blogs become more R-slanted. Each point represents 
a media outlet. (Horizontal axis: slant score $\theta$.) 

Let $n^c_{ik}$ now be the number of times outlet $i$ refers 
to the $k$-th legislator, where $c \in \{\text{News}, \text{Blogs}\}$ as before, 
and $k \in \{1, 2, \ldots, L\}$ is the rank index for one 
of the $L$ legislators, ordered by the number of references 
received from outlet $i$. We can replace $n_{ik}$ by the 
sample proportion $\hat{p}_{ik} = n_{ik}/n_i$. The slant score $\theta_i$ of 
outlet $i$ is defined by a generalized log-odds-ratio: 

$$\theta_i = \log \frac{\sum_{j<k} \hat{p}_{ik}\, p_j}{\sum_{j>k} \hat{p}_{ik}\, p_j}, \tag{4}$$

where $p_j$ is, again, the baseline probability that $i$ refers 
to the $j$-th legislator, and the $\{p_j\}$ can be chosen to be 
uniform or any other distribution. For convenience we 
commonly fix the baseline distribution for all $i$. 

When $L = 2$, Eq. 4 reduces to a dichotomous-outcome 
log-odds-ratio measure similar to Eq. 1. 
When $L > 2$ and the $\{p_j\}$ are not uniform, changing 
to a different baseline is not a simple linear shift. With 
Eq. 4, a slant score with value $a$ can be interpreted as 
follows: the number of times outlet $i$ mentions high 
ranked legislators is $e^a$ times more than if the legislators 
were ranked according to their baseline probabilities. 
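
A sketch of the generalized score is given below. It assumes the ratio form of a generalized odds ratio for ordered categories: the numerator sums observed mass at rank $k$ times baseline mass at lower ranks $j < k$, and the denominator does the same for higher ranks $j > k$. It also assumes both lists are ordered from least to most referenced, so that a positive score indicates concentration on heavily referenced legislators. The function name and inputs are hypothetical.

```python
import math
from itertools import accumulate


def generalized_slant(p_hat, p_base):
    """Generalized log-odds-ratio of observed shares against a baseline.

    Both sequences are indexed by rank and should each sum to 1.
    """
    if len(p_hat) != len(p_base) or len(p_hat) < 2:
        raise ValueError("need two distributions of equal length >= 2")
    cumulative = list(accumulate(p_base))
    total = cumulative[-1]
    # Baseline mass strictly below and strictly above each rank k.
    below = [c - p for c, p in zip(cumulative, p_base)]
    above = [total - c for c in cumulative]
    numerator = sum(ph * b for ph, b in zip(p_hat, below))
    denominator = sum(ph * a for ph, a in zip(p_hat, above))
    if numerator <= 0 or denominator <= 0:
        raise ValueError("score is undefined for these distributions")
    return math.log(numerator / denominator)


# Two groups: the score matches the dichotomous log-odds-ratio.
two_group = generalized_slant([0.2, 0.8], [0.5, 0.5])
# Three legislators with reference shares 10%, 20%, 70%, uniform baseline.
concentrated = generalized_slant([0.1, 0.2, 0.7], [1 / 3, 1 / 3, 1 / 3])
```

Cumulative sums keep the computation linear in the number of legislators instead of looping over all rank pairs.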

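The variance in the slant score is now given by [15]: 

$$\mathrm{Var}(\theta_i) = \ldots \tag{5}$$

where 

$$\alpha_{ij} = \theta_i \sum_{k<j} p_k - \sum_{k>j} p_k, \qquad \beta_{ij} = \theta_i \sum_{k>j} \hat{p}_{ik} - \sum_{k<j} \hat{p}_{ik}.$$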

Figure 3 (b) plots the number of references (observations) 
against front-runner slant scores for news and 
blog outlets with more than 20 posts in our dataset. We 
expect the front-runner slant scores to be mostly positive 
since the legislators are already ranked by popularity 
($n_{ik}$). 

The system-wide front-runner slant score for both 
news and blog media can be calculated as before. Table 
2 summarizes front-runner slants with respect to various 
baselines. Note that the two media show different 
biases when referencing the two chambers: Blogs 
are more front-slanted than news about Senators, while 
news outlets are more front-slanted when referencing 
Representatives. 

5.4 Other Types of Slant 

5.4.1 Ideology 

The concept of ideology is closely related to that of po- 
litical party - members of the same party usually share 
similar or less contradictory ideologies. We study the 
ideological bias using a method similar to the party 
slant analysis. We first locate each legislator relative 
to an identifiable ideological orientation such as left or 
right, and then use the dichotomous-outcome measure 
to obtain ideological slant scores for individual outlets 
as well as system-wide scores for News and Blogs. 

We use the DW-NOMINATE scores for the 
U.S. Congress [16] as measures of legislators' ideological 
locations. The estimates are based on the history 
of roll call votes by the members of Congress 
and have been widely used in political science studies 
and related fields. We classify each legislator as 
either ideologically-left or -right, based on the sign of 
their estimate. We then calculate the ideological slant 
score $\theta_{ik}$, $k \in \{\text{Left}, \text{Right}\}$, for each outlet $i$ with 
$k = \text{Left}$, so that $\theta_i > 0$ indicates outlet $i$ is more likely 
to be Left-slanted. 

Based on their method, each member's ideological point is estimated 
along two dimensions. Previous research has shown that 
the first dimension reveals standard left-right or economic cleavages, 
and the second dimension reflects social and sectional divisions. 
In this paper we use only the first dimension. 

Our ideological slant measurements are also summarized 
in Table 2. We find this measure is highly correlated 
with the party slant measurement (with Pearson 
correlation r = 0.958 and p < 10~^). This suggests 
that, while party members may be found at different 
positions in the left-right spectrum, media outlets tend 
to pick legislators who are representatives of the two 
parties' main ideologies, such as Left-wing Democrats 
or Right-wing Republicans. 

5.4.2 Gender 

Gender is also treated as a dichotomous variable, where 
$\theta_i > 0$ indicates that the coverage of outlet $i$ favors 
male legislators. The results, summarized in Table 2, 
show that blogs have a slightly stronger female-slant 
than news. However, when considering the population 
baseline, the slant for both media is significant for the 
Senate but nearly insignificant for the House. The gen- 
der composition in both chambers is similar - 20% of 
the members are women. The differences in the esti- 
mates based on different baselines reflect a very dif- 
ferent voter population represented by the female/male 
legislators in both chambers. 

5.4.3 Region 

We consider region as a categorical variable. For each 
legislator, the state or territory of his or her district 
is used. The region slant is calculated like the front-runner 
slant: the slant score $\theta_i$ is defined as per Eqs. 4 
and 5, where $k \in \{1, 2, \ldots, S\}$ is the rank index for one 
of the $S$ states in the US, ordered by the number of references 
received from outlet $i$. The results are again 
summarized in Table 2. Overall, news outlets show a 
much stronger regional bias than blogs. The negative 
slant scores in the House, based on the population baseline, 
indicate that outlets favor representatives from 
more populous states. 



Estimates for the 111th Congress are available at: 
http://voteview.spia.uga.edu/dwnomin.htm. 
6 Examining Coverage 

As mentioned earlier, the slant scores of media outlets 
are calculated based only on the quantity of references 
to legislators, and are independent of the coverage con- 
tent. In this section, we examine two intrinsic aspects 
of this coverage, the hyperlinks between outlets and the 
sentiments of the textual content, as related to the party 
slants. 

6.1 Links 

We extract the hyperlinks embedded in each news arti- 
cle or blog post and study how media outlets with dif- 
ferent slants link to one another. Using the sign of 
the party slant score $\theta_p$, we divide News and Blogs 
into four sectors: D-slanted news, R-slanted news, D- 
slanted blogs, and R-slanted blogs. 

Table 3 shows the prevalence of links among the four 
sectors. Each entry $(i, j)$ represents the total number 
of hyperlinks from outlets in category $i$ pointing to the 
articles of outlets in category $j$. The linking pattern 
exhibits interesting phenomena. First, and most obviously, 
news outlets have far fewer hyperlinks in their articles 
than blog posts do. Blogs with more hyperlinks can 
also be seen as second-hand reporters or commentators 
responding to news articles and other blog 
posts. Second, articles in the D-slanted outlets, including 
news and blogs, are more likely to be cited, including 
by outlets with the opposite slant. For example, the 
R-slanted blogs have a large number of hyperlinks to 
the D-slanted news outlets. Third, the matrix shows a 
strong assortativity [17] in the D-slanted community: 
the D-slanted blogs are more likely to cite articles from 
D-slanted news and blogs than the R-slanted blogs are 
to cite R-slanted news and blogs. In fact, linking patterns 
among the R-slanted community appear to be disassortative. 
It would be interesting to compare our results 
with those of Adamic et al. [18]. 
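
The sector-to-sector link counts described above could be tallied along these lines. The sketch assumes each outlet is described by its medium and party slant score, and that outlets with a score of exactly zero are left out because the sign does not assign them to a side; the function names, sector labels, and toy outlets are hypothetical.

```python
from collections import Counter


def sector(medium, slant):
    """Label an outlet such as 'D-news' or 'R-blog' from the sign of its slant."""
    if slant == 0:
        return None
    return ("D" if slant > 0 else "R") + "-" + medium


def link_matrix(outlets, links):
    """Count hyperlinks between sectors; entry (i, j) counts links from i to j."""
    counts = Counter()
    for source, target in links:
        if source not in outlets or target not in outlets:
            continue  # link to or from an outlet outside the studied set
        src, dst = sector(*outlets[source]), sector(*outlets[target])
        if src is not None and dst is not None:
            counts[(src, dst)] += 1
    return counts


# Hypothetical outlets: name -> (medium, party slant score).
outlets = {
    "news-1": ("news", 0.3),
    "news-2": ("news", -0.2),
    "blog-1": ("blog", 1.1),
    "blog-2": ("blog", -0.8),
}
links = [
    ("blog-1", "news-1"),
    ("blog-2", "news-1"),
    ("blog-2", "news-2"),
    ("blog-1", "elsewhere"),
]
matrix = link_matrix(outlets, links)
```

Comparing the within-side entries with the cross-side entries of such a matrix is one simple way to look for the assortative pattern discussed in the text.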

6.2 Texts 

Our slant estimation is based on how many times an
outlet references a legislator, regardless of positive or
negative attitude. Without any sentiment information,
the estimated scores need to be interpreted carefully:
a significant slant score only reflects the existence of
bias, but not the polarity (if any) of such bias. This
subsection describes our attempt to study sentiment
information within the media. We employ the OpenAmplify
API to extract the sentiment information of each
reference. The API returns, for each article, the
detected named entities and the sentiment values
associated with those entities. We derive sentiment
information for (outlet, legislator) pairs by matching
legislator names to the names detected in each article,
then aggregating the sentiment scores associated with
these legislators over all of the outlet's articles. The
sentiment scores for parties can be derived from the
scores received by party members.

Table 3: The strength of hyperlinks among News and Blogs
with Democrat or Republican slants. Each entry (i, j)
represents the total number of hyperlinks from category i
to category j.

Figure 6: Joint probability density for negative sentiment
and party slant score. The solid line is the averaged
trend. We see a positive correlation for D-slanted media
and a negative correlation for R-slanted media (r:
correlation coefficient; p: p-value).
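
The aggregation over (outlet, legislator) pairs described
in this subsection can be sketched in Python as follows.
This is a minimal sketch under stated assumptions: it
does not call the OpenAmplify API and only consumes
already-extracted (outlet, entity name, sentiment)
records, the record layout and the crude name matching
are hypothetical, and averaging is only one possible way
to aggregate the scores.

```python
from collections import defaultdict

def normalize(name):
    """Crude name normalization (assumption): lowercase, collapse spaces."""
    return " ".join(name.lower().split())

def aggregate_sentiment(records, legislators):
    """Average sentiment for each (outlet, legislator) pair.

    records: iterable of (outlet, entity_name, sentiment) tuples, one per
             entity detected in an article (hypothetical layout).
    legislators: dict mapping legislator name -> party ("D" or "R").
    """
    known = {normalize(name): name for name in legislators}
    totals = defaultdict(lambda: [0.0, 0])  # pair -> [sum, count]
    for outlet, entity, sentiment in records:
        name = known.get(normalize(entity))
        if name is None:
            continue  # the detected entity is not a legislator
        entry = totals[(outlet, name)]
        entry[0] += sentiment
        entry[1] += 1
    return {pair: s / c for pair, (s, c) in totals.items()}

def party_sentiment(pair_scores, legislators):
    """Average each outlet's pair scores over the members of each party."""
    totals = defaultdict(lambda: [0.0, 0])
    for (outlet, name), score in pair_scores.items():
        entry = totals[(outlet, legislators[name])]
        entry[0] += score
        entry[1] += 1
    return {key: s / c for key, (s, c) in totals.items()}

# Hypothetical legislators and extracted entity records.
legislators = {"Jane Doe": "D", "John Roe": "R"}
records = [
    ("blogX", "Jane Doe", -0.5),
    ("blogX", "jane  doe", -1.5),      # same person, different spelling
    ("blogX", "John Roe", 1.0),
    ("blogX", "Some Celebrity", 2.0),  # ignored: not a legislator
]
pairs = aggregate_sentiment(records, legislators)
print(pairs)
print(party_sentiment(pairs, legislators))
```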

Figure 6 shows the probability density of the resultant
negative sentiment scores against the party slant
scores. The results show a weak correlation between
sentiment values and the party slant scores. Outlets'
sentiments for Democratic legislators are positively
correlated with their slant scores, while sentiments for
Republican legislators are negatively correlated. This
suggests that outlets slanted toward a particular party
tend to mention that party less negatively. This
tendency is easier to discern in Blogs than in News, but
the difference may be caused by differences in the use of
language rather than in the level of bias.
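
For readers who wish to reproduce this kind of comparison,
the sketch below computes a Pearson correlation
coefficient between slant score and negative sentiment
separately for outlets with positive and negative slant
scores. The data points are hypothetical, the groups are
labeled only by sign because the sketch makes no
assumption about which sign corresponds to which party,
and no p-value is computed.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def correlation_by_slant_sign(points):
    """Correlate slant with negative sentiment within each slant sign.

    points: list of (slant_score, negative_sentiment) pairs, one per
            outlet (hypothetical layout).
    """
    groups = {"positive": [], "negative": []}
    for slant, sentiment in points:
        if slant > 0:
            groups["positive"].append((slant, sentiment))
        elif slant < 0:
            groups["negative"].append((slant, sentiment))
    result = {}
    for label, pts in groups.items():
        if len(pts) >= 3:  # skip groups too small for a meaningful r
            xs, ys = zip(*pts)
            result[label] = pearson_r(xs, ys)
    return result

# Hypothetical (slant score, negative sentiment) pairs.
points = [
    (0.1, 0.5), (0.2, 0.4), (0.3, 0.3),
    (-0.1, 0.5), (-0.2, 0.4), (-0.3, 0.3),
]
print(correlation_by_slant_sign(points))
```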



Footnote: OpenAmplify API, http://community.openamplify.com/

7 Modeling the reference-generating process

What are the underlying mechanisms governing how
News and Blogs choose to reference legislators? Are
there similarities or differences between these two
media? We propose to use a simple generative model [19]
for the probability $P(n)$ that a legislator is
referenced a total of n times. Comparing the results of
the model's isolated mechanism with the actual data will
give intuition about factors contributing to the
observed $P(n)$.

The model is as follows. Initially ($t = 0$), we assume
a single reference to some legislator $k'$, such that
$n_k(0) = \delta(k, k')$ for all k. At each time step
the media (News or Blogs) selects a random legislator
to reference in an article. With probability q, however,
the media rejects that legislator and instead references
a legislator with probability proportional to his or her
current coverage. That is, at each time step t,
$n_k(t+1) = n_k(t) + 1$ occurs with probability $p_k(t)$:

$$p_k(t) = \begin{cases} 1/|V_L| & \text{with prob. } 1-q, \\ n_k(t) \big/ \sum_{k'} n_{k'}(t) & \text{with prob. } q. \end{cases}$$

This captures the intuitive "rich-get-richer" notion of
fame, while the parameter q tunes its relative strength.
Those legislators lucky (or newsworthy) enough to be
referenced early on are likely to become heavily
referenced, since they have more opportunities to receive
references, especially as q increases. Since one
reference is handed out at each time step, the total
number of references measured empirically fixes the
timespan over which the model is run; $|V_L|$ is also
fixed, so the model has one parameter, q. Asymptotically
($|V_L| \to \infty$), this model gives a pure power law
$P(n) \sim n^{-1-1/q}$ for all $q > 0$ [19]. The
distribution of n is more complex for finite $|V_L|$,
however, taking a Gaussian-like form for $q < 1/2$ and a
heavy-tailed distribution for $q > 1/2$.
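
The process described above is simple enough to simulate
directly. The Python sketch below is one possible
implementation, not the code used to produce our results.
The function names, the number of legislators, and the
number of references are hypothetical, and we assume
that the initial reference counts toward the total.

```python
import random
from collections import Counter

def simulate_references(num_legislators, num_references, q, seed=None):
    """Simulate the reference-generating process.

    Returns a list whose k-th entry is the number of references
    received by legislator k after num_references references in total.
    """
    rng = random.Random(seed)
    counts = [0] * num_legislators
    first = rng.randrange(num_legislators)
    counts[first] = 1   # initial condition: a single reference at t = 0
    history = [first]   # one entry per reference handed out so far
    for _ in range(num_references - 1):
        if rng.random() < q:
            # Rich-get-richer step: drawing a past reference uniformly
            # selects legislator k with probability n_k(t) / sum of n_k'(t).
            k = rng.choice(history)
        else:
            # Uniform step: each legislator has probability 1 / |V_L|.
            k = rng.randrange(num_legislators)
        counts[k] += 1
        history.append(k)
    return counts

def reference_distribution(counts):
    """Empirical P(n): fraction of legislators with exactly n references."""
    histogram = Counter(counts)
    total = len(counts)
    return {n: c / total for n, c in sorted(histogram.items())}

# Hypothetical sizes; compare a weak and a strong rich-get-richer effect.
for q in (0.2, 0.8):
    counts = simulate_references(500, 50000, q, seed=1)
    print(q, max(counts), min(counts))
```

Sampling uniformly from the list of past references is a
convenient way to draw a legislator in proportion to
current coverage without recomputing the normalizing sum
at every step.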

Figure 7 compares the observed $P(n)$ with that
generated using the model process. We observe good
qualitative agreement, better than fitted Poisson or
log-normal distributions, although there is a slight
tendency to overestimate popular legislators and
underestimate unpopular legislators. The empirical
distributions also exhibit a slight bimodality, perhaps
due to the 2010 election, that is not captured by the
model. The larger value of q for Blogs than for News
provides evidence that Blogs collectively are more driven
by a rich-get-richer selection process than News,
although this may not hold at the individual outlet
level.

Footnote: The single-reference initial condition assumed
at t = 0 differs from the flat start of Bagrow et al.
[19], with important consequences for finite-time models.

Figure 7: The generative model for the distribution of
references n per legislator. The larger value of q for
Blogs indicates that they are more driven by the
rich-get-richer mechanism than News, although both
distributions are heavy-tailed. Dashed lines indicate
fitted Poisson and log-normal distributions, for
comparison.

The measures of front-runner slant indicate that 
News have a stronger front-runner bias than Blogs. 
This seems to conflict with the reference generating 
model, which showed that blog behavior is more ex- 
plainable by the rich-get-richer mechanism (q is larger 
for Blogs than for News). However, we argue that the 
measures and the model are in fact consistent, since
the model only treats the aggregate of the entire media
class. The stronger front-runner bias in News outlets
means that each outlet is more likely to reference its
own intrinsic set of front-runners, which may differ
from those of other outlets; for Blogs, the "stickiness"
of each outlet's individual set of front-runners is
weaker, and hence globally popular front-runners are
more likely to emerge over time. Examining this argument
further would require explicitly modeling the bias of
individual outlets.

This one-parameter model neglects a number of dy- 
namical features that may be worth future pursuit. For 
example, generalizations may be able to explain the
temporal dynamics of the references, the joint
distributions $n_{ik}$ between media outlet i and
legislator k, etc.

8 Discussion and Open Issues 

Our results show that News and Blogs, in aggregate, 
have only slightly different slants in terms of party and 
ideology. However, the dynamics of the party slant 
measures suggest Blogs are more sensitive to exogenous
shocks, such as the mid-term election. Our observations
were made over a short, four-month timeframe; long-term,
continuous tracking of slant dynamics would be necessary
to reveal any consistently different dynamical behavior
between the two media.

Our measures and model are solely based on the 
quantity of coverage. We have conducted preliminary 
sentiment analysis using an off-the-shelf tool and com- 
pared the extracted sentiment results with our mea- 
sures. The results suggest a weak connection between 
the quantity and semantics of referencing a subject. It 
would be worth investigating the accuracy of sentiment 
detection on different media content and how sentiment 
analysis can be used to identify bias from texts. In ad- 
dition, critical content analysis (which examines not 
only the text but also the relationship with audience) 
and multivariate analysis (since multiple types of slants 
are inter-related) may be leveraged for further analysis. 

9 Conclusion 

In this paper, we develop system-wide bias measures 
to quantify bias in mainstream and social media, based 
on the number of times media outlets reference the
members of the 111th US Congress. In addition to
empirical measurements, we also present a generative
model to explore how each medium's global distribution
of the number of references per legislator evolves over
time. We observe that social media are indeed more
social, i.e., more affected by network and exogenous
factors, resulting in a more heavily skewed and uneven
distribution of popularity. Perhaps there are more
voices than ever, but many are echoes.

We plan to continue work along the lines discussed 
in the previous section, such as long-term tracking of 
slant dynamics in the two media, modeling individ- 
ual outlets' biases, and leveraging content analysis and 
multivariate analysis.